|
Biological data are data or measurements collected from biological sources, often stored or exchanged in digital form, most commonly in files or databases. Examples include DNA base-pair sequences and population data used in ecology.

==Data File Formats==

Each file format has been designed with specific needs and outputs in mind.

* AB1 – Chromatogram files used in DNA sequencing by instruments from Applied Biosystems.
* ACE – A sequence assembly format.
* BAM – Binary compressed SAM format.
* BED – The Browser Extensible Data format, used for describing genes and other features of DNA sequences.
* CAF – Common Assembly Format for sequence assembly.
* EMBL – The flatfile format used by the EMBL to represent database records for nucleotide and peptide sequences from EMBL databases.
* FASTA – The FASTA file format, for sequence data. Sometimes also given as FNA or FAA (FASTA Nucleic Acid or FASTA Amino Acid).
* FASTQ – The FASTQ file format, for sequence data with quality scores. Sometimes also given as QUAL.
* GenBank – The flatfile format used by the NCBI to represent database records for nucleotide and peptide sequences from the GenBank and RefSeq databases.
* GFF – The General Feature Format, used for describing genes and other features of DNA, RNA and protein sequences.
* GTF – The Gene Transfer Format, used to hold information about gene structure.
* NEXUS – The Nexus file format, which encodes mixed information about genetic sequence data in a block-structured format.
* NWK – The Newick tree format, a way of representing graph-theoretical trees with edge lengths using parentheses and commas. It is useful for holding phylogenetic trees.
* PDB – Structures of biomolecules deposited in the Protein Data Bank. Also used for exchanging protein and nucleic acid structures.
* PHD – Phred output, from the basecalling software Phred.
* SAM – Sequence Alignment/Map format, the format chosen for releasing the results of the 1000 Genomes Project.
* SCF – Staden chromatogram files, used to store data from DNA sequencing.
* SBML – The Systems Biology Markup Language, used to store computational models of biochemical networks.
* SFF – Standard Flowgram Format.
* Stockholm – The Stockholm format for representing multiple sequence alignments.
* Swiss-Prot – The flatfile format used to represent database records for protein sequences from the Swiss-Prot database.
* VCF – Variant Call Format, a standard created by the 1000 Genomes Project and used to list and annotate its collection of human variants (with the exception of approximately 1.6 million variants).

Source: Wikipedia, the free encyclopedia.
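As an illustration of how simple some of these text formats are, the FASTA entry in the list above can be read with a few lines of Python. This is a minimal sketch, not a reference parser: it assumes the common convention that each record starts with a `>` header line followed by one or more sequence lines, and the function name `parse_fasta` and the sample records are hypothetical.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record identifier to its concatenated sequence.

    Assumes the usual FASTA layout: a header line beginning with ">"
    followed by one or more lines of sequence data.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # The identifier is the first whitespace-delimited token
            # after ">"; the rest of the header is a free-text description.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines of one record are joined without separators.
            records[current_id] += line
    return records


# Hypothetical sample data, made up for this illustration.
example = """>seq1 hypothetical nucleotide record
ACGTAC
GTTA
>seq2 hypothetical peptide record
MKV
"""

records = parse_fasta(example.splitlines())
for identifier, sequence in records.items():
    print(identifier, len(sequence), sequence)
```

For real data an established library is usually a better choice than a hand-written parser, since files in practice may carry duplicate identifiers, comments, or other variations that this sketch does not handle.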
|